Abstract According to Fortunato and Barthelemy, modularity-based community detection algorithms have
a resolution threshold such that small communities in a large network are invisible. Here we generalize
their work and show that the q-state Potts community detection method introduced by Reichardt and
Bornholdt also has a resolution threshold. The model contains a parameter by which this threshold can be
tuned, but no a priori principle is known to select the proper value. Single global optimization criteria do
not seem capable of detecting all communities if their size distribution is broad.



PACS. 89.75.-k Complex systems - 89.75.Fb Structures and organization in complex systems -
89.75.Hc Networks and genealogical trees - 89.65.-s Social and economic systems



1 Introduction 



Networks are an efficient way to represent a variety of
complex systems, including technological, biological and
social systems [1,2]. Many networks have substructures
called communities, which are, loosely speaking, groups
of nodes that are densely interconnected but only sparsely
connected with the rest of the network [3,4,5,6]. Detecting
such communities is of interest because they may provide
valuable information about the substructure and functionality
of the network, e.g., functional modules in metabolic
networks or communities of individuals interacting with each
other. This analysis can also be extended to more
complex properties, including networks of communities
[7], roles of nodes inside and between communities [6], and
the effect of communities on dynamics such as
information flow through the network [8].

A large number of algorithms have been developed for
detecting communities; for reviews see [9,10]. A particularly
popular method is based on the concept of modularity
Q introduced by Newman and Girvan,

$$ Q = \sum_s \left( e_{ss} - a_s^2 \right), \tag{1} $$

where $e_{rs}$ is the fraction of links that fall between nodes
in communities r and s and $a_s = \sum_r e_{rs}$. Detecting
communities is then equivalent to maximizing the modularity of the
network. The optimization is computationally demanding,
especially for large networks, but can be carried out with
various approximate methods [12,13,11,14,15]. Modularity
optimization has been shown to perform well for many test
networks [9,16].
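
As an illustration of Eq. (1), the following Python sketch computes Q from an edge list and a node-to-community map. It assumes an undirected, unweighted graph without self-loops; the function name `modularity` and the toy graph of two triangles joined by one link are hypothetical choices made for this example only.

```python
from collections import defaultdict

def modularity(edges, community):
    """Q = sum_s (e_ss - a_s^2) for an undirected, unweighted graph."""
    L = len(edges)
    e_in = defaultdict(float)   # e_ss: fraction of links inside community s
    a = defaultdict(float)      # a_s: fraction of link ends attached to s
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e_in[cu] += 1.0 / L
        # Each link end counts as half a link, so that sum_s a_s = 1.
        a[cu] += 0.5 / L
        a[cv] += 0.5 / L
    return sum(e_in[s] - a[s] ** 2 for s in a)

# Toy graph (an assumption of this sketch): two triangles joined by one link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_blocks = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_block = {v: "A" for v in range(6)}
q_two = modularity(edges, two_blocks)
q_one = modularity(edges, one_block)
```

For this toy graph the split into the two triangles gives Q = 5/14, whereas placing all nodes in a single community gives Q = 0.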

Recently, Fortunato and Barthelemy showed that modularity
optimization fails to find small communities in
large networks, indicating that it is favorable to combine
small communities into larger ones [17]. In a network
which has L links, there is a characteristic number
of links, such that communities with fewer than $\sqrt{L/2}$ links
are not visible. Earlier, Reichardt and Bornholdt (RB) had
introduced a general framework for community detection,
which includes modularity optimization as a special
case [18,19]. Starting from a q-state Potts Hamiltonian,
they show that community detection can be interpreted
as finding the ground state of an infinite-range spin glass.
Potts spins are assigned to the nodes of the network, and
the communities can be identified as clusters of aligned
spins in the ground state. The model is based on a comparison
of the investigated network to a null model which
can be arbitrarily chosen. In addition, the method contains
a tunable parameter $\gamma$ for detecting community structures
at different hierarchical levels. The Newman-Girvan modularity
optimization method is a special case in this general
framework, where the null model is the configuration
model [20] and $\gamma = 1$. The question arises whether the
more general RB spin-glass-based community detection
method is able to overcome the limitations of
modularity optimization. Our paper addresses this question.

We analyze the effect of $\gamma$ on community detection and
consider how to design a network with optimal community
structure, study the resolution limit and its estimates by
using a general null model, and, finally, demonstrate the
consequences of our findings in certain example cases.

2 Optimal number of communities in the RB 
model 

For detecting the communities in a network, Reichardt
and Bornholdt proposed the following Hamiltonian:

$$ \mathcal{H} = -\sum_{i \neq j} \left( A_{ij} - \gamma p_{ij} \right) \delta(\sigma_i, \sigma_j), \tag{2} $$

where $A_{ij}$ denotes the adjacency matrix of the graph with
$A_{ij} = 1$ if an edge is present and zero otherwise,
$\sigma_i \in \{1, 2, \ldots, q\}$ denotes the group index of node i,
$\gamma$ is a parameter of the model, and $p_{ij}$ denotes the link
probability between nodes i and j according to the null model.
The null model reflects the connection probability between
nodes in a network having no apparent community structure.
Possible choices for the null model are, for example,
$p_{ij} = p$ and $p_{ij} = \frac{k_i k_j}{2L}$, where $k_i$ is the degree of node
i and L is the number of links in the network. The former
null model corresponds to the Erdős-Rényi network
[21], whereas the latter one is closely related to the
configuration model. The Hamiltonian (2) rewards existing
links inside communities, but the reward is reduced if $p_{ij}$
is large. Furthermore, the penalty of a missing link inside
a community is proportional to its probability. The
modularity Q of Eq. (1) is related to Eq. (2) as $Q = -\mathcal{H}/L$,
provided that $\gamma = 1$ and $p_{ij} = \frac{k_i k_j}{2L}$.

In order to gain some insight into the model given by
Eq. (2), we consider two limits of $\gamma$. First, when $\gamma \to 0$,
each link inside a community comes as a "surprise", while
the missing links do not increase the energy as they
are not expected to exist. Thus, in the limit $\gamma = 0$ the
minimum energy is obtained when all nodes are assigned
to the same community, and the minimum energy is
$\mathcal{H} = -2L$. Second, when $\gamma \gg 1$, communities are
broken into smaller pieces because the penalty from missing
links is large and all existing links are considered to be
extremely likely. When $\gamma$ exceeds the inverse of the minimum
of the non-zero $p_{ij}$, the terms $A_{ij} - \gamma p_{ij}$ in (2) all become
negative, and the minimum energy is obtained when each
node is regarded as a separate community, resulting in
$\mathcal{H} = 0$. This demonstrates that for small values of $\gamma$ one
can expect to find large community structures, whereas
for large values of $\gamma$ only small community structures are
found. The total amount of energy that can possibly be
contributed by links and non-links is equal for $\gamma = 1$,
which can be regarded as a natural choice. Later we show,
however, that optimizing the energy with $\gamma = 1$ does not
necessarily yield the obvious and most natural community
structure even in a simple test case.
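
The two limits can be checked numerically on a small graph. The sketch below evaluates Eq. (2) by brute force over ordered node pairs, assuming the configuration-type null model $p_{ij} = k_i k_j / 2L$; the helper name `rb_energy` and the toy graph (two triangles joined by one link) are hypothetical and serve only as an illustration.

```python
from itertools import permutations

def rb_energy(n_nodes, edges, sigma, gamma):
    """Eq. (2) with p_ij = k_i k_j / (2L), summed over ordered pairs i != j."""
    L = len(edges)
    adjacent = set()
    degree = [0] * n_nodes
    for u, v in edges:
        adjacent.add((u, v))
        adjacent.add((v, u))
        degree[u] += 1
        degree[v] += 1
    energy = 0.0
    for i, j in permutations(range(n_nodes), 2):
        if sigma[i] == sigma[j]:          # the Kronecker delta in Eq. (2)
            a_ij = 1.0 if (i, j) in adjacent else 0.0
            p_ij = degree[i] * degree[j] / (2.0 * L)
            energy -= a_ij - gamma * p_ij
    return energy

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_block = [0] * 6            # all nodes in the same community
singletons = list(range(6))    # every node is its own community
e_zero = rb_energy(6, edges, one_block, gamma=0.0)
e_single = rb_energy(6, edges, singletons, gamma=1.0)
e_large = rb_energy(6, edges, one_block, gamma=4.0)
```

With L = 7 links, the single-community energy at $\gamma = 0$ is -2L = -14, the all-singletons energy is 0, and for $\gamma = 4$ (above the inverse of the smallest $p_{ij}$, which is 3.5 here) the single community has positive energy, so the singletons are preferred.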

Following the steps in [17], we next consider how to
design a connected network with N nodes and L links
such that the energy (2) is minimized. In particular, we
are interested in the optimal number of communities as a
function of L and $\gamma$. Therefore, we study a network which
has n fully connected subgraphs (or cliques) of equal size,
interconnected with n links and arranged in a ring-like
structure, see Fig. 1(A). This network has by construction
n communities, namely the cliques; i.e., the links
inside the cliques are intra-community links, while those
connecting them are inter-community links. The minimization of
(2) should reflect this structure, providing the n equal-size
communities. Moreover, for such an obvious structure this
result should be robust against changing $\gamma$ or even the null
model.

Equation (2) can be rewritten as

$$ \mathcal{H} = -\sum_{s=1}^{n} \left( l_s - \gamma [l_s]_{p_{ij}} \right), \tag{3} $$

where $l_s$ is the number of links inside community s and
$[l_s]_{p_{ij}}$ is the expected number of links in that community
given the link distribution $p_{ij}$ and the current assignment
of nodes into communities [19]. In order to be compatible
with the calculations in [17], we choose first to use
$p_{ij} = \frac{k_i k_j}{2L}$, i.e., our reference system is the configuration
model. In this case, $[l_s]_{p_{ij}} = \frac{1}{4L} K_s^2$, where $K_s$ is the sum
of degrees of nodes in community s. It is straightforward
to show that Eq. (3) is minimized when each community
has $L/n - 1$ links. Then, the energy is

$$ \mathcal{H}_{\min}(n, \gamma, L) = -\left( L - n - \gamma \frac{L}{n} \right). \tag{4} $$

The optimal number of communities, $n^*$, is obtained as the
zero of the derivative $\partial \mathcal{H}_{\min}(n, \gamma, L) / \partial n$. This yields
$n^* = \sqrt{\gamma L}$, which in turn gives back the result of [17] for $\gamma = 1$.
If the null model is $p_{ij} = p$, i.e., an Erdős-Rényi graph,
a similar calculation shows that the energy minimum is
obtained when each community has an equal number of
nodes. In this case, the optimal number of communities is
again $n^* \approx \sqrt{\gamma L}$.

Let us suppose that, given N and L, we have constructed
a ring-like network as described above, having
more than $\sqrt{\gamma L}$ cliques. The previous analysis shows,
counterintuitively, that the energy (3) is not minimized when
each clique is considered as a separate community. Instead,
it is better to relabel the communities so that small
communities are merged to form larger ones. On the other
hand, if the number of communities is much smaller than
$\sqrt{\gamma L}$, it might be advantageous to split large communities
into smaller ones. Therefore, the original, well-defined
communities are not necessarily found by optimizing the
quality function (3). In particular, small communities may
remain unresolved.



3 Resolution threshold with a general null 
model 

Figure 1. A: A ring-like network of n cliques joined by n links.
B: Consecutive cliques can be merged to form larger communities.
The optimal configuration depends on the network parameters
and $\gamma$.

The previous section suggests that the most common null
models, $p_{ij} = \frac{k_i k_j}{2L}$ and $p_{ij} = p$, lead to merging of small
communities in large networks. In this section we investigate
the case of a general null model and the effect of $\gamma$
on the resolution. Hence, we consider a general undirected,
unweighted network having N nodes and L links. Let us
suppose that the nodes have somehow been assigned to
communities. We take two communities, labeled s and r,
having $l_s$ and $l_r$ links inside, respectively, and
$l_{s \leftrightarrow r}$ links between them. The question is, when should
the communities be merged? At first, the energy (3) reads as follows

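$$ E_1 = \sum_{t \neq s,r} \left( -l_t + \gamma [l_t]_{p_{ij}} \right) + \left( -l_r + \gamma [l_r]_{p_{ij}} \right) + \left( -l_s + \gamma [l_s]_{p_{ij}} \right), \tag{5} $$

whereas after combining the communities the energy is

$$ E_2 = \sum_{t \neq s,r} \left( -l_t + \gamma [l_t]_{p_{ij}} \right) - \left( l_r + l_s + l_{s \leftrightarrow r} \right) + \gamma [l_{s+r}]_{p_{ij}}, \tag{6} $$

where $[l_{s+r}]_{p_{ij}}$ is the expected number of links in the
combined community and $l_{s \leftrightarrow r}$ is the number of links between
the communities s and r. The communities should be combined if

$$ \Delta E = E_2 - E_1 = -l_{s \leftrightarrow r} + \gamma \left( [l_{s+r}]_{p_{ij}} - [l_r]_{p_{ij}} - [l_s]_{p_{ij}} \right) < 0. \tag{7} $$

But $[l_{s+r}]_{p_{ij}} - [l_r]_{p_{ij}} - [l_s]_{p_{ij}} = [l_{s \leftrightarrow r}]_{p_{ij}}$ is the expected number of
links between the communities, and equation (7) reduces
to

$$ \gamma [l_{s \leftrightarrow r}]_{p_{ij}} < l_{s \leftrightarrow r}. \tag{8} $$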

As the communities have $n_s$ and $n_r$ nodes, respectively, the
maximum number of links between the communities is $n_s n_r$.
In a large network, the average probability for connecting
two nodes has to be of the order of $N^{-1}$, regardless of the
null model. Therefore, the expected number of links
between the communities, $[l_{s \leftrightarrow r}]_{p_{ij}}$, is on average of the order
of $n_s n_r / N$. Using this estimate in Eq. (8) suggests that
even a single link between communities may trigger
merging if the communities are small, i.e., $n_s, n_r \ll N$. In
particular, communities of approximately the same size
are merged if

$$ n_s \approx n_r < \sqrt{\frac{N \, l_{s \leftrightarrow r}}{\gamma}}. \tag{9} $$
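
The merging criterion of Eq. (8) and the size estimate of Eq. (9) are simple enough to evaluate directly. The following Python sketch does so; the function names and the example numbers (two 20-node communities in a 10,000-node network) are hypothetical, and the expected link count uses the order-of-magnitude estimate $n_s n_r / N$ from the text rather than a specific null model.

```python
import math

def should_merge(l_between, expected_between, gamma=1.0):
    """Eq. (8): merging lowers the energy if gamma * [l] < l."""
    return gamma * expected_between < l_between

def size_threshold(n_nodes, l_between=1, gamma=1.0):
    """Eq. (9): equal-size communities below this node count are merged."""
    return math.sqrt(n_nodes * l_between / gamma)

# Two 20-node communities joined by one link in a 10_000-node network,
# using the estimate [l] ~ n_s * n_r / N for the expected number of links.
N, n_s, n_r = 10_000, 20, 20
expected = n_s * n_r / N
merge = should_merge(1, expected, gamma=1.0)
limit = size_threshold(N, l_between=1, gamma=1.0)
limit_gamma_4 = size_threshold(N, l_between=1, gamma=4.0)
```

Here a single connecting link already exceeds the expected 0.04 links, so the two communities are merged. Raising $\gamma$ from 1 to 4 only halves the size threshold, from 100 to 50 nodes.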



Now, let us suppose the communities are loosely connected
to each other, that is, $l_{s \leftrightarrow r} \sim 1$. When this is applied in
Eq. (9), we obtain that it is beneficial to combine
communities smaller than $\sim \sqrt{N/\gamma}$. This is the lower limit
for the community size that the method is able to
detect. Large values of $\gamma$ decrease this resolution threshold,
but rather inefficiently. When the communities are more
densely interconnected, the resolution threshold increases.
In the extreme (unphysical) limit, when the communities
are connected with $l_{s \leftrightarrow r} \sim L$ links, Eq. (9) indicates that
even communities whose size is comparable to the whole
network may remain unresolved. Similar results for the
resolution thresholds were obtained in Ref. [17] for the case
$\gamma = 1$, $p_{ij} = k_i k_j / 2L$: two tightly connected communities
may be merged if each has fewer than $L/4$ links, whereas
the lower limit is $\sqrt{L/2}$ for communities connected with
a single link.

The community structure found by the RB model
corresponds to the global minimum of (3). It should be noted
that the previous calculations do not prove that the
particular communities s and r will be in the same
community for the global minimum. The calculations show,
however, that the global minimum does not contain connected
communities smaller than the above-mentioned size limits,
because by combining them a lower energy would be
achieved.

Equation (8) also shows that cliques are stable against
splitting for any reasonable $\gamma$. Suppose that a clique is
split into two parts having $n_s$ and $n_r$ nodes, respectively. The
parts have the maximum number $l_{s \leftrightarrow r} = n_s n_r$ of
connecting links. Substituting this and $[l_{s \leftrightarrow r}]_{p_{ij}} \sim n_s n_r / N$ into
Eq. (8) shows that it is beneficial to split a clique only
when $\gamma \sim N$. Such a high value of $\gamma$ does not, however,
make sense, because in this case
the average number of links from a node according to the
null model would exceed the maximum possible number
of links from a node, and the communities would be split
into individual nodes. We conclude that when $\gamma \ll N$,
cliques and almost complete cliques are not split.



4 Examples

We illustrate the consequences of the above results in three
example cases. Let us first consider the simplest possible
case of community detection [17]: the network
consists of a ring of complete cliques joined by single links,
Fig. 1(A). There are n cliques and each clique has m nodes
and $m(m-1)/2$ links. Figure 1(B) shows a case where r
consecutive cliques are merged to form a single community.
A straightforward calculation shows that in this case
the energy is given by

$$ \mathcal{H} = -n \left( \frac{m(m-1)}{2} + \frac{r-1}{r} \right) + \frac{\gamma r}{2} \left[ m(m-1) + 2 \right], \tag{10} $$

when $p_{ij} = \frac{k_i k_j}{2L}$. By joining cliques, we get a "bonus"
from the links joining the cliques, i.e., the term $(r-1)/r$, but
in large communities the expected number of links inside
the communities increases faster than in small communities.
Thus, for small $\gamma$ the merged cliques have low
energy, but as $\gamma$ increases the energy grows quite fast,
as illustrated in Fig. 2. The optimal configuration found
by optimizing Eq. (2) is the configuration that has the
lowest energy for the given values of n, m and $\gamma$. In
particular, it can be shown that the natural communities are
found only if

$$ m(m-1) + 2 > \frac{n}{\gamma}. \tag{11} $$

When the link probability is $p_{ij} = p$, we obtain the same
result with a correction term of the order of $(\gamma m)^{-1}$.

Figure 2. Energy (10) as a function of $\gamma$ for a system where
n = 60, m = 6 and r = 1, ..., 6. The optimal configuration
depends on $\gamma$, and the natural communities are found only when
$\gamma > 1.875$, cf. Eq. (11).
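
The ring-of-cliques energy is easy to evaluate for the parameters n = 60 and m = 6 used in Fig. 2. The Python sketch below assumes the form of Eq. (10) for the configuration null model, i.e. $\mathcal{H} = -n[m(m-1)/2 + (r-1)/r] + \gamma r [m(m-1)+2]/2$, and compares r = 1, ..., 6 merged cliques; the function names are hypothetical.

```python
def ring_energy(n, m, r, gamma):
    """Eq. (10): energy when r consecutive cliques form one community."""
    links_per_clique = m * (m - 1) / 2
    return (-n * (links_per_clique + (r - 1) / r)
            + gamma * r / 2 * (m * (m - 1) + 2))

def best_merge(n, m, gamma, r_values=range(1, 7)):
    """Return the candidate r with the lowest energy."""
    return min(r_values, key=lambda r: ring_energy(n, m, r, gamma))

def gamma_threshold(n, m):
    """Eq. (11) solved for gamma: single cliques win above this value."""
    return n / (m * (m - 1) + 2)

n, m = 60, 6
threshold = gamma_threshold(n, m)
r_at_gamma_1 = best_merge(n, m, 1.0)
r_at_gamma_2 = best_merge(n, m, 2.0)
```

For these parameters the threshold is 60/32 = 1.875. At $\gamma = 1$ the lowest energy is obtained by merging pairs of cliques (r = 2), while at $\gamma = 2$ the individual cliques (r = 1) are recovered.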

Our second example is a random network, which has
often been used as a test network for community detection
algorithms [11]. The network consists of n communities
each having m nodes. Each node has on average $\langle k \rangle$ links,
of which $\langle k_{\mathrm{in}} \rangle$ go to random nodes in the same group and
$\langle k_{\mathrm{out}} \rangle = \langle k \rangle - \langle k_{\mathrm{in}} \rangle$ links lead randomly to nodes in other
communities. Let us now calculate when, on average,
it is beneficial to merge two designed communities. We
obtain that the average number of observed links between
the communities is

$$ \langle l_{s \leftrightarrow r} \rangle = \frac{m}{n-1} \langle k_{\mathrm{out}} \rangle, \tag{12} $$

where the averaging is done over all the realizations of the
network. Note that if $\frac{m}{n-1} \langle k_{\mathrm{out}} \rangle < 1$ we have to set
$\langle l_{s \leftrightarrow r} \rangle = 1$, because we are considering only communities
which are connected by at least one link. The null model
is again $p_{ij} = \frac{k_i k_j}{2L}$. According to the null model, the
expected number of links between communities is

$$ [l_{s \leftrightarrow r}]_{p_{ij}} \approx \frac{1}{2L} \left( m \langle k \rangle \right)^2 = \frac{m \langle k \rangle}{n} \tag{13} $$

when averaged over the realizations of the network. Now
Eqs. (8), (12) and (13) give that the communities are
merged if

$$ \gamma < \begin{cases} \dfrac{n}{m \langle k \rangle}, & \text{for } n > 1 + m \langle k_{\mathrm{out}} \rangle, \\[2ex] \dfrac{n}{n-1} \dfrac{\langle k_{\mathrm{out}} \rangle}{\langle k \rangle}, & \text{otherwise}. \end{cases} \tag{14} $$

Figure 3. An example of the effect of network size on the
resolution of the Potts method. Symbols correspond to
communities. See text for details.

For typical values n = 4, m = 32, $\langle k \rangle = 16$ and $\langle k_{\mathrm{out}} \rangle = 1 \ldots 8$
we find that $\gamma = 1$ should give the correct communities.
Thus, it is not surprising that community detection
based on modularity optimization (1) performs well for this
network. We point out that it is possible to choose the
parameters n, m, $\langle k \rangle$ and $\langle k_{\mathrm{out}} \rangle$ in such a way that
modularity optimization with $\gamma = 1$ does not give the designed
communities.
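
The averages above can be combined into a small numerical check of the merging condition. This Python sketch applies Eq. (8) to the mean observed link count $\frac{m}{n-1}\langle k_{\mathrm{out}} \rangle$ (floored at one link) and to the mean expected link count $m \langle k \rangle / n$, which follows from $(m\langle k \rangle)^2 / 2L$ with $L = n m \langle k \rangle / 2$. The function names and the second, larger parameter set are hypothetical and chosen only for illustration.

```python
def observed_links(n, m, k_out):
    """Mean observed links between two groups, at least one link."""
    return max(1.0, m * k_out / (n - 1))

def expected_links(n, m, k):
    """Null-model expectation (m k)^2 / (2L) with L = n m k / 2."""
    return m * k / n

def merged(n, m, k, k_out, gamma=1.0):
    """Eq. (8) applied to the two averages above."""
    return gamma * expected_links(n, m, k) < observed_links(n, m, k_out)

# Benchmark values quoted in the text: n = 4, m = 32, <k> = 16.
results = {k_out: merged(4, 32, 16, k_out) for k_out in range(1, 9)}

# A hypothetical instance with many groups (not from the original text).
many_groups = merged(n=1000, m=32, k=16, k_out=8)
```

For the benchmark values none of the cases $\langle k_{\mathrm{out}} \rangle = 1, \ldots, 8$ satisfies the merging condition at $\gamma = 1$, in line with the statement above. In the hypothetical instance with n = 1000 groups the expected number of links between two groups drops to about 0.5, so a single connecting link is enough to merge them.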

As a third example we note that the Potts Hamiltonian
(2) can be generalized to weighted networks by using a
weighted adjacency matrix $w_{ij}$. A simple way to do this
is to define

$$ \mathcal{H}_w = -\sum_{i \neq j} \left( w_{ij} - \gamma \langle w_{ij} \rangle p_{ij} \right) \delta(\sigma_i, \sigma_j), \tag{15} $$

where $\langle w_{ij} \rangle$ is the average link weight. In this way, strong
links inside communities lower the energy greatly, while
missing links are assumed to be of average weight. Using
weights does not, however, resolve the underlying problem
that in a large network even a single link easily exceeds
the expected weight between the communities.
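
A direct evaluation of the weighted Hamiltonian (15) may look as follows. This is only a sketch under stated assumptions: the text does not fix the null model for the weighted case, so $p_{ij} = k_i k_j / 2L$ with unweighted degrees is assumed here, the sum runs over ordered pairs, and the function name and toy weights are hypothetical.

```python
from itertools import permutations

def weighted_energy(n_nodes, weighted_edges, sigma, gamma):
    """Eq. (15), assuming p_ij = k_i k_j / (2L) with unweighted degrees."""
    L = len(weighted_edges)
    mean_w = sum(w for _, _, w in weighted_edges) / L   # average link weight
    weight = {}
    degree = [0] * n_nodes
    for u, v, w in weighted_edges:
        weight[(u, v)] = w
        weight[(v, u)] = w
        degree[u] += 1
        degree[v] += 1
    energy = 0.0
    for i, j in permutations(range(n_nodes), 2):
        if sigma[i] == sigma[j]:
            p_ij = degree[i] * degree[j] / (2.0 * L)
            energy -= weight.get((i, j), 0.0) - gamma * mean_w * p_ij
    return energy

# Two triangles with strong internal links and one weak bridge.
w_edges = [(0, 1, 2.0), (1, 2, 2.0), (0, 2, 2.0),
           (3, 4, 2.0), (4, 5, 2.0), (3, 5, 2.0), (2, 3, 0.5)]
split = [0, 0, 0, 1, 1, 1]
e_split = weighted_energy(6, w_edges, split, gamma=1.0)
e_joint = weighted_energy(6, w_edges, [0] * 6, gamma=1.0)
e_joint_zero = weighted_energy(6, w_edges, [0] * 6, gamma=0.0)
```

In this toy case the weak bridge contributes less than its expected weight, so the split into two triangles has the lower energy at $\gamma = 1$.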

Finally, in Figure 3 we demonstrate the effect of network
size on the resolution of the Potts method. Panel
a) shows a network of four groups of 10 nodes. We have
compared the energies (3) for two community divisions
using $\gamma = 1$ and the configuration null model. $E_1$
is the energy for the case when all four groups are
assigned to a single community, whereas $E_4 = -100.2$ is the
energy when the four groups are each assigned to a different
community. In this case, $E_4 < E_1$, i.e., the groups are
properly identified as communities. However, if the original
network of panel a) is modified such that an additional
60-clique community is connected to it via a single link,
the situation is changed. All nodes of this new 60-clique
are assigned to a single community. Now, $E'_1 = -271.17$
is the energy when the original four groups are merged
into a Potts community, and $E'_4 = -269.73$ the energy
when they are assigned to separate communities. Hence
$E'_1 < E'_4$, i.e., the energy for merged groups is lower. This
is unphysical, since connecting the new clique via a single
link does not alter the original four-group topology.

5 Conclusions 

In the light of the above considerations it is clear that
the problem of the resolution limit is not restricted to
the Newman-Girvan method of modularity optimization.
Rather, it is a flaw which seems to be present in any
community detection scheme based on global optimization of
intra- and extra-community links and on a comparison
to any null model. The limited resolution arises from the
fact that in a large network the expected number of links
between two small sets of nodes is small, and even a single
link between the sets is enough to merge them. The
null model uses the global probability of connecting nodes,
while the small communities should be considered on a
more local level. We agree with the conclusion of Ref. [17]
that presently, in large networks, local community detec- 
tion methods like J| seem to perform better from the point 
of view of resolution. An alternative solution to this problem
could be to iteratively change the parameter $\gamma$ when
looking for smaller communities in a large network.

Our results indicate that when the community structure
is not known beforehand, there is no simple way to
decide which $\gamma$ gives the most relevant communities. Moreover,
if the size distribution of the communities is broad,
as in collaboration networks [6] or school friendship networks
[22], there is no single proper value of $\gamma$ for the optimal
resolution. The hierarchical structure can be examined
to some extent by using several values of $\gamma$ [19], but
this method may find too much hierarchy in the network
as it tends to artificially merge communities. Because of
this tendency, one should always carefully investigate the
structure of the found communities.